Genome-wide identification and functional analysis of class III peroxidases in Gossypium hirsutum

Class III peroxidase (PRX) genes play essential roles in various processes, such as auxin catabolism, removal of H2O2, crosslinking cell wall components, and response to biotic and abiotic stresses. In this study, we identified 166, 78 and 89 PRX genes from G. hirsutum, G. arboretum and G. raimondii, respectively. These PRX genes were classified into seven subfamilies based on phylogenetic tree analysis and the classification of PRX genes in Arabidopsis. Segmental duplication and purifying selection were the major factors driving the evolution of GhPRXs. GO and KEGG enrichment analysis revealed that GhPRX genes were mainly associated with responding to oxidative stresses, peroxidase activities and phenylpropanoid biosynthesis pathways. Transcriptome data analysis showed that GhPRX genes expression were significantly different in microspore development between the sterility line-JinA and the maintainer line MB177. We confirmed the up-regulation of GhPRX107 and down-regulation of GhPRX128 in the sterile line compared to its maintainer line using qRT-PCR, suggesting their roles in pollen fertility. In addition, silencing GhPRX107 in cotton showed a significant decrease of the reactive oxygen species (ROS) levels of microsporocyte stage anthers compared to control. Overexpressing GhPRX107 in Arabidopsis significantly increased the ROS levels of anthers compared to wild type. In conclusion, we identified GhPRX107 as a determinant of ROS levels in anther. This work sets a foundation for PRX studies in pollen development.


INTRODUCTION
Peroxidase (EC 1.11.1.X) family is an important enzyme family which can catalyze oxidoreduction (Dunford & Stillman, 1976). These enzymes have been widely found in animals, plants, and microorganisms. Peroxidases are classified into three major types based on protein structures and catalytic properties: class I (ascorbate peroxidase), class II (lignin peroxidase) and class III (secretory peroxidase) (Bhatt & Tripathi, 2011).

Plant materials
The cotton cytoplasmic male sterility line (JinA) and its maintainer line (MB177) were used to explore the patterns of temporal expression in flower buds. The sterility and maintainer line plants were planted in Shanxi Agricultural University. Flower buds of different stages (sporogonium stage, microsporocyte stage and meiosis stage) were collected and stored at −80 C for subsequent RNA extraction and qRT-PCR analysis. We used maintainer line (MB177) for gene silencing. The maintainer line plants were planted in artificial climate chamber (23 C, 70% relative humidity, photoperiod of 8 h darkness/16 h light) and cotton flower buds were collected at Microsporocyte stage for subsequent qRT-PCR and anthers ROS level analysis. The wild-type Arabidopsis thaliana Columbia (Col-0) was used for gene overexpression. They were planted in artificial climate chamber (22 C, 60% relative humidity, photoperiod of 8 h darkness/16 h light). The samples were collected at different development stage flower buds (6,7,8,9,10,11,12 stages) for subsequent anthers ROS level analysis.

Multiple alignments and phylogenetic analysis
The full amino acid sequences of PRXs from three cotton species and Arabidopsis were aligned using clustalW program (Thompson, Higgins & Gibson, 1994). The parameters for alignment by clustalW were as follows: gap opening penalty, 10; gap extension penalty, 0.2; protein weight matrix, gonet; residue-specific penalties, on; hydrophilic penalties, on; delay divergent cutoff (%): 30. A maximum likelihood (ML) phylogenetic tree was constructed using MEGA 7.0 program (Kumar, Stecher & Tamura, 2016) with bootstrap 1,000 repetitions and the Jones-Taylor-Thornton (JTT) model (Jones, Taylor & Thornton, 1992), then was drawn using EvolView (He et al., 2016). The PRXs in G. hirsutum were assigned to different subfamilies based on phylogenetic relationships and the classification of PRXs in the Arabidopsis (Tognolli et al., 2002).

Chromosome locations and gene structural of PRX genes in Gossypium hirsutum
The genome and the General Feature Format 3 (GFF3) files of G. hirsutum ) (TM-1 HAU_v1.1) were downloaded from CottonGen (https://www.cottongen.org/) (Yu et al., 2014). The GFF3 file contains annotation information for the location of genes, coding sequences (CDS), and untranslated regions (UTRs) in the genome. According to the genome and annotation files of G. hirsutum, we obtained and visualized the gene structure (exons/introns) information and chromosomal positions of PRXs using TBtools (Chen et al., 2020).

Gene duplication and calculation of Ka/Ks values
To identify the duplication events that occurred in GhPRXs of the G. hirsutum genome, the whole genome sequences of G. hirsutum were compared using the BLASTP program (e-value <1e −10 ) of TBtools (Chen et al., 2020). Then, the MCScanx  with default parameters was used to detect the duplication patterns including segmental and tandem duplication. TBtools (Chen et al., 2020) was used to visualized paralogous gene pairs. In order to understand the selection pressures during the expansion of GhPRX gene family, the nonsynonymous mutation rate (Ka), synonymous mutation rate (Ks), and Ka/Ks values of homologous gene pairs were calculated by KaKs Calculator program in TBtools (Chen et al., 2020).

Analysis of cis-acting element in promoters
Upstream region of 1,500 bp from the translation initiation codon ATG of GhPRXs were selected as the promoter and entered into the Plantcare website for promoter analysis (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) (Lescot et al., 2002).

Analysis of GO and KEGG pathway enrichment
For functional enrichment analysis, the Gene Ontology (GO) and Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed using the Omicshare tools (https://www.omicshare.com/tools), taking false discovery rate (FDR) ≤ 0.05 as a threshold. The genes in the background file used by GO and KEGG were all genes that have been annotated to the GO term and KEGG pathway, from cotton genetic improvement group of HuaZhong Agricultural University   (Table S1).

Expression pattern analysis
To further explore functions of GhPRXs in G. hirsutum, the RNA-seq data of eight different tissues (bract, petal, torus, root, leaf, stem, pistil, sepal and anther) of G. hirsutum (TM-1) were downloaded from the NCBI (https://www.ncbi.nlm.nih.gov/) (Sayers et al., 2021) (accession number: PRJNA490626). The transcriptome data of cytoplasmic male sterility line-Jin A and maintainer line MB177 flower buds were obtained from Shanxi Agricultural University cotton breeding laboratory (Yang, Han & Huang, 2014b). Trimmomatic (Bolger, Lohse & Usadel, 2014) was used to perform quality control and remove the adapters. Specifying parameters were as follows: adaptors were considered based on sequencing instrument as default, "SLIDINGWINDOW is 4:15 and minimum read length is 30 bp". Those remaining were aligned to the G. hirsutum  (TM-1 HAU_v1.1) genomes using the hisat2 program (Kim, Langmead & Salzberg, 2015), then Cufflinks (Ghosh & Chan, 2016) was used to calculate the fragments Per Kilobase of transcript per Million fragments (FPKM) values. The calculating parameters of Cufflinks were as follows: frag-bias-correct and multi-read-correct were used in this step. GhPRXs with FPKM > 1 were considered as expressed genes. TBtools software (Chen et al., 2020) was used to visualize the expression patterns of the GhPRXs based on the value of log 2 (FPKM + 1).

RNA extraction and qRT-PCR analysis
We used EASYspin plant RNA quick isolation kit (RN38; Aidlab Biotech, Beijing, China) to extract the whole RNA. The Takara Rverse Transcription kit (Japan) was used to generate the first cDNA strand. The specific primers were designed using the NCBI database (https://www.ncbi.nlm.nih.gov/) (Sayers et al., 2021) (Table S2). We used the CFX96 Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA) to test qRT-PCR. The final volume of 20 μL including 10 μL SYBR Green PCR mix (Takara, Maebashi, Japan), 1 μL of specific primers, 1 μL of cDNA and 7 μL of ddH 2 O. The reaction program was as follow: one cycle of 95 C for 30 s; 40 cycles of 94 C for 5 s; one cycle of 60 C for 30 s, and 40 cycles of 72 C for 30 s. The housekeeping EF-1a gene was used as the reference (Yang, Han & Huang, 2014b). The experiment was repeated three times. We calculated expression levels of GhPRX genes using 2 −ΔΔCt method (Rao, Lai & Huang, 2013). The statistical test was performed using t-tests (Livak & Schmittgen, 2001), and P < 0.05 was considered indicating a statistically significant difference ( Ã P < 0.05; ÃÃ P < 0.01).

VIGS
In order to explore the function of GhPRX107 in G. hirsutum, we used the tobacco rattle virus (TRV)-based vectors to preform virus-induced gene silencing (VIGS) (Pang et al., 2013). The specific primer was designed using SnapGene software (from Insightful Science; available at snapgene.com) (Table S3). We cloned the highly specific region of 400-bp from GhPRX107 into the EcoRI and BamHI sites of the TRV-based (pYL156) vector using ClonExpressÒ II One Step Cloning Kit (Vazyme Biotech Co, Ltd, Nanjing, China) to generate the TRV:GhPRX107 vector. The plasmid of TRV2:GhPRX107 and TRV2:00 vectors were transformed into Agrobacterium tumefaciens GV3101 and subsequently transformed into the maintainer lines-MB177 by cotyledon injection. The photobleaching phenotype which silenced the GhCLA1 gene in MB177 by VIGS was used as phenotype control. We used the qRT-PCR to detect the silencing efficiency of GhPRX107 gene in TRV2:GhPRX107 plant, the TRV2:00 lines were used as control. We analyzed anthers ROS (O 2 − and H 2 O 2 ) levels of microsporocyte stage in control (TRV2:00) and TRV2:GhPRX107 plants by staining with nitroblue terazolium (NBT) and 3,3-diaminobenzidine (DAB), respectively.

GhPRX107 overexpression in Arabidopsis
To further explore the function of GhPRX107, we manipulated GhPRX107 levels by overexpression of GhPRX107 in Arabidopsis. We cloned GhPRX107 coding sequence from anthers cDNA using the primer pair PRI-F/PRI-R (Table S3). It was integrated into the PRI-AN-101 vector at the EcoRI and XbaI sites using ClonExpressÒ II One Step Cloning Kit (Vazyme Biotech Co, Ltd, Nanjing, China) to generate the PRI-GhPRX107 vector. The plasmid of PRI-GhPRX107 was transformed into Agrobacterium tumefaciens GV3101 and subsequently transformed into wild type Arabidopsis by floar dip method (Pang et al., 2013). We compared anthers ROS (O 2 − and H 2 O 2 ) levels of different development stages (Sanders et al., 1999) between wild type and overexpression A.thaliana by staining with nitroblue terazolium (NBT) and 3,3-d iaminobenzidine (DAB), respectively.

Identification of PRX family members in Gossypium hirsutum, G. arboretum and G. raimondii
In order to identify the PRX family members in G. hirsutum, G. arboretum and G. raimondii, we used the 73 Arabidopsis PRX proteins (Tognolli et al., 2002) as a reference to search and blast proteins from the three cotton genomes. In addition, we used NCBI CDD (http://www.ncbi.nlm.nih.gov/cdd) (Lu et al., 2020) to confirm if every PRX member contained the complete conserved domain of PRXs. After eliminating redundant sequences, we identified a total of 166, 78 and 89 PRX genes in G. hirsutum, G. arboretum and G. raimondii, respectively (Table S4). The correspondinrg PRX genes were renamed based on the chromosomal locations.
The predicted isoelectric point (pI) and molecular weight of PRX proteins were 4.07-10.43 (MW, (Table S4), suggesting physical property differences between diploid cotton (G. arboretum and G. raimondii) and tetraploid cotton species (G. hirsutum). We observed that most PRX proteins were predicted to contain a signal peptide (294/333, 88.28%), which aligned with the properties of secreted proteins. The subcellular localization prediction results showed that 48.94% of PRXs located in chloroplasts, 27.92% in extracellular spaces, and the remaining 23.11% in cytoplasm, nucleus, vacuoles, and mitochondria (Table S4). The diverse set of predicted organelle locations implied different functions of PRX members.

Phylogenetic analysis
To understand the evolutionary relationships of the PRX gene family, we constructed a maximum-likelihood (ML) phylogenetic tree by repeating PRX proteins from G. hirsutum, G. arboretum, G. raimondii and Arabidopsis for 1,000 times. Combining the results of phylogenetic tree analysis with those of previous studies in Arabidopsis, we categorized PRXs into seven subfamilies (Fig. 1). Each subfamily contained PRX genes of the four species, indicating this gene family was conserved in different species during evolution. In addition, we observed that the homology of the PRX sequences was high among most of the PRXs derived from the At subgenome of the allotetraploid cotton (G. hirsutum) and the PRXs from G. arboretum. The PRX sequences from the Dt subgenome of G. hirsutum had high homology with the PRX genes from G. raimondii. This was consistent with the hypothesis that the allotetraploid cotton species came from the recombination of two diploid cotton species (Liu et al., 2015). The AtPRX9, AtPRX40 and GhPRX89 belonged to the same subfamily. Previous studies showed that GhPRX89 was involved in the male reproductive processes of cotton (Chen et al., 2009). AtPRX9 and AtPRX40 have been identified to be essential for the Arabidopsis anther development (Jacobowitz, Doyle & Weng, 2019). Collectively the results suggested that genes in the same subfamily share similar functions.

Chromosome locations of PRX genes in Gossypium hirsutum
Based on the annotated and sequencing information of G. hirsutum, we constructed a chromosomal map (Fig. 2), where 162 GhPRX genes were unevenly distributed on the 26 G. hirsutum chromosomes. Four PRX genes were found on the scaffold (Fig. 2). Among the chromosome-located genes, 79 and 83 were on the At-and Dt-subgenome chromosomes, respectively. The number of PRXs in allotetraploid cotton (G. hirsutum) was not equal to the diploid species (G. arboretum and G. raimondii), which is likely due to either gene loss in tetraploid species or gain in diploid species after polyploidization event. For the At subgenome, most genes were located on A05 (n = 11), and A13 had the least number of genes (n = 1); For the Dt subgenome, most gene were located on D12 (n = 12), and D06 had the least number of genes (n = 2). Additionally, we observed that some chromosome regions exhibited a relatively higher density of GhPRX genes, such as the bottoms of A09 and A12, and the tops of A05 and A08.

Gene structural analysis of GhPRXs
To further explore the structural diversity of the GhPRX genes, we analyzed the exons and introns of the 166 GhPRX genes. The numbers of GhPRX gene exons varied from 1-8. Most members contained four exons (104/166, 62.6%, Fig. 3B). We identified a conserved intron/exon gene structure for the GhPRX genes. More than half of the GhPRX genes had three introns and four exons (103/166, 62.6%), highly similar to the Arabidopsis PRX gene structure (Tognolli et al., 2002). However, the gene structures of 52 GhPRX family members were inconsistent with the three intron/four exon structure. Their intron numbers changed during evolution. However, in the same subfamilies, most members shared great similarity in gene structures and numbers of exons.

Gene duplication analysis of GhPRXs
Gene duplication, including tandem duplication and segmental duplication, is the main driving force in the evolution of genomes (Cannon et al., 2004). In this study, we identified 121 paralogous gene pairs in G.hirsutum by BLASTP and MCScanX. Among them, 100 included segmental duplications, while the remaining 21 were tandem duplications (Fig. 4). Segmental duplication was likely to be the main reason for the expansion of the GhPRX gene family. In general, Ka/Ks < 1 indicates negative or purifying selection, Ka/Ks = 1 stands for neutral selection, and Ka/Ks > 1 suggests positive selection. The Ka/Ks ratios of the GhPRX gene pairs were <1 except for the GhPRX134 and GhPRX135 gene pair (Fig. 4, Table S5), implying that these gene pairs underwent purifying selection.

Analysis of GhPRX promoters
The upstream promoter regions of genes contain cis-acting elements that regulate gene transcription. Here we analyzed the sequences of 1,500 bp upstream of the GhPRX genes using Plantcare (Lescot et al., 2002). Based on their putative functions, the cis-acting elements were categorized into three major groups, i.e., the hormone-responsive, stress-responsive and growth-responsive cis-regulatory groups (Fig. 3C, . Among the plant hormonal cis-acting regulatory elements, the number of GhPRXs having at least one ABA element was the largest, and the total number of MeJA elements across GhPRXs was the largest. More than half of the GhPRXs contained one to sixteen MeJA cis-acting regulatory elements. The stress-response element was the second largest category, which includes drought (MBS)-, low temperature (LTR) -and defense-response elements. In addition, we also found cis-acting elements involved in endosperm, circadian, seed specific, anaerobic, meristem and cell cycle regulation. In summary, these results suggested that GhPRXs play important roles in plant growth, development and responses to abiotic stresses.

GO and KEGG enrichment analysis of GhPRXs
To further understand the functions of GhPRXs, we performed functional enrichment annotations of gene ontology (GO) using FDR ≤ 0.05 as the cutoff. A total of 265 GO terms were obtained, 51 out of which were significantly enriched (Table S7). The top 20 significantly enriched terms were visualized using the Omicshare tools (Fig. 5).
The enriched gene ontology-biological processes (GO-BP) included the response to oxidative stress, and oxidation-reduction process and response. The gene ontology-molecular function (GO-MF) results showed the peroxidase activity, oxidoreductase activity and antioxidant activity enrichment. The gene ontology-cellular component (GO-CC) results suggested the GhPRX family genes were significantly enriched in the plant-type cell wall. Meanwhile, we carried out functional enrichment of Kyoto encyclopedia of genes and genomes (KEGG). We detected three signaling pathways in the KEGG analysis, among which the phenylpropanoid biosynthesis pathway was significantly enriched (Fig. 6, Table S7). Taken together, GhPRXs were involved in many biological processes, including the response to oxidative stress, peroxidase activity, and phenylpropanoid biosynthesis.

Expression patterns of GhPRXs
We investigated the expression patterns of GhPRX genes using publicly available RNA-seq data of nine different tissues (bract, petal, torus, root, leaf, stem, pistil, sepal and anther). Due to the large number of members in the GhPRX family, genes were divided into two groups (A and B) based on the subfamily for analysis, and the expression was shown as log 2 values (Fig. 7). The expression patterns of the A group (Fig. 7A) could be divided into five clades, namely clade 1 to 5. Clade 1 contained 29 genes with expression high in root. Clade 2 included 27 genes, including GhPRX143 and GhPRX152, which were not expressed in all the eight tissues. Clades 3 and 4 had 21 genes, most of which showed low expression levels in most tissues. The expression levels of clade 3 genes were lower than those of clade 4 genes. Clade 5 contained eight genes, which showed high expression levels in most tissues. Particularly, all the Clade 5 genes showed high expression levels in petals and anthers. The B group (Fig. 7B) showed similar expression patterns as the A group (Fig. 7A). GhPRX107 showed high expression levels in all the tissues, implying its essential  buds to explore the expression patterns of 166 GhPRX genes (Fig. 8). Using the same two groups as defined above, we detected the expression levels of seven genes were significantly different between the sterile and maintainer lines. Interestingly, the effect directions of the seven genes were different: GhPRX107, GhPRX27 and GhPRX99 were significantly up-regulated in the sterile line, while GhPRX44, GhPRX124, GhPRX48 and GhPRX128 were significantly down-regulated in the sterile line.
To validate the transcriptomics data, we selected two genes (GhPRX107 and GhPRX128) from the seven differentially expressed genes and performed qRT-PCR in the sterile (JinA) and maintainer line (MB177) flower buds that were collected at different stages (sporogonium stage, microsporocyte stage and meiosis stage). Consistent with RNA-seq data, GhPRX107 was significantly up-regulated in the sterile line flower buds at all the three stages, while GhPRX128 showed the opposite effect (Fig. 9). Collectively these results indicated that these seven PRX genes are closely related to pollen fertility.

Silencing GhPRX107 reduced ROS levels in microsporocyte-stage anthers
To explore the role of GhPRX107 in pollen fertility, we silenced GhPRX107 in cotton using virus-induced gene silencing (VIGS). After 11 days, the infected cotton leaves with TRV2: CLA1 showed photobleaching phenotype suggesting successful silencing of CLA1 (Fig. 10A). Using a similar strategy, we silenced GhPRX107 using TRV:GhPRX107. qRT-PCR results confirmed that GhPRX107 expression was significantly reduced in TRV: GhPRX107 plants compared to the control plants (TRV:00; Fig. 10B).
Previous studies showed that PRXs were not only oxidizing target substrates with H 2 O 2 , but also acting as key factors in producing ROS. To explore if GhPRX107 is associated with ROS production, we analyzed the ROS (O 2 − and H 2 O 2 ) levels of anthers at the microsporocyte stage between the GhPRX107-silenced and control plants by staining with NBT and DAB, respectively. We showed that ROS (O 2 − and H 2 O 2 ) levels were significantly decreased in the GhPRX107-silenced cotton plants compared to the control plants (Figs. 10C, 10D). Our results suggested an association between GhPRX107 expression levels and ROS levels in anthers.

GhPRX107 overexpression in Arabidopsis enhanced ROS levels in anthers
To further explore the role of GhPRX107 in male reproductive processes, we genetically transformed Arabidopsis using a GhPRX107 overexpression vector. NBT and DAB staining showed that the levels of superoxide anion (O 2 −) significantly increased around stages 6 and 7 in Arabidopsis anthers overexpressing GhPRX107. Hydrogen peroxide levels (H 2 O 2 ) also significantly increased from stages 7 to 9 compared with the wild type (Fig. 11). These results further suggested an association between GhPRX107 expression levels and ROS levels in anthers during microspore development.  (Tognolli et al., 2002), Populus, Oryza sativa , Maize , Pear (Cao et al., 2016) and Brachypodium distachyon  have been identified and analyzed.
In this study, we identified 166, 78 and 89 PRX genes in G. hirsutum, G. arboretum and G. raimondii, respectively. Compared with the PRX genes in Arabidopsis, we found more PRX genes in G. hirsutum. Although both Arabidopsis and G. hirsutum are dicotyledonous plants, the different degrees of polyploidy may be the main reason driving the different numbers of PRX genes in the two plants. G. hirsutum is allotetraploid with A and D genomes. Previous studies have shown that the Dt subgenome of G. hirsutum came from G. raimondii. The A2 genome of G. arboretum and the At subgenome of G. hirsutum may originate from a common ancestor (Du et al., 2018). The collinearities were largely conserved between the At subgenome and the A2 genome, and between the Dt subgenome and the D5 genome. Specifically~75.3% of the TM-1 At subgenome matched with 72.1% of the A2 genome in one-to-one syntenic blocks, and~78.1% of the TM-1 Dt subgenome matched with 85.6% of the D5 genome in one-to-one syntenic blocks . PRX genes corroborated this relationship among the three cotton species. We constructed a maximum likelihood (ML) phylogenetic tree of PRXs in G. hirsutum, G. arboretum, G. raimondii and Arabidopsis and showed that the PRXs of G. hirsutum can be divided into seven subfamilies with genes from the three cotton species contributing to each subfamily. Previous studies have divided Arabidopsis into five subfamilies (Tognolli et al., 2002). The reason for the inconsistency was that homologous genes of AtPRX47, AtPRX64, AtPRX66, AtPRX21 and AtPRX12, and AtPRX47, AtPRX66 and AtPRX66 were divided into two clades each, resulting in two extra subfamilies. We found that the number and physical properties of PRXs showed differences between diploid (G. arboretum and G. raimondii) and tetraploid cotton species (G. hirsutum) as well, suggesting independent evolution of their genomes. In a long-term evolutionary process, terminal repeats have made important contributions to the expansion of A genome scale, speciation and evolution . Abundant species-specific structural variations in gene regions have changed the expression of many important genes. Compared with G. raimondii, there were some unique structural variations in G. hirsutum, for example, there were the large fragment inversions in D09 chromosome and large inter-arm inversions in D12 chromosome. This indicated these variations occurred after polyploidization . Moreover, the species-specific gene families with relatively high proportion experienced more expansion or contraction in diploid D5 genome species . Therefore, different evolutionary pressures may be the reason for the differences between A2 genome and At subgenome, D5 genome and Dt subgenome.
Spatial and temporal expression patterns of PRX genes relate to their functions. Genome-wide gene expression analysis in Arabidopsis flowers showed that the members of PRX family were highly expressed in floral organs (Wellmer et al., 2004). Genes specifically or mainly expressed in plant floral organs were reported to be integral in floral organ development (Chen et al., 2009). PRX9 and PRX40 genes have been shown to be essential for normal Arabidopsis tapetum and microspore development (Jacobowitz, Doyle & Weng, 2019). Although cotton PRX Ghpod gene was found to be specifically expressed in flower buds and possibly involved in male development processes of angiosperms (Chen et al., 2009), no comprehensive expression patterns of PRX genes have been identified in different tissues and along male development processes.
Here we analyzed the expression patterns of GhPRX family genes using the transcriptomics data of nine tissues from a public database. We found many PRX members were highly expressed in anthers. More importantly, we investigated this gene family using our own transcriptomics data from a sterile line and a maintainer line. During flower bud development, we detected three GhPRX genes significantly up-regulated and four genes significantly down-regulated in the male sterile line. We validated our findings using qRT-PCR at three flower bud development stages. Therefore, we hypothesized that these genes played important roles during male reproductive processes of cotton.
Since ROS is closely related to male reproductive processes (Hu et al., 2011;Xie et al., 2014;Yang, Han & Huang, 2014a) and PRX functions, we carried out functional studies of one PRX gene-GhPRX107. We found that ROS contents of the microsporocyte-stage anthers from GhPRX107-silenced cotton plants were significantly decreased than that of controls. Overexpression of GhPRX107 in Arabidopsis significantly increased ROS levels in anthers. Taken together GhPRX107 is a determinant of ROS levels in anther.
Yunfang Qu conceived and designed the experiments, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft. Jinlong Zhang performed the experiments, analyzed the data, prepared figures and/or tables, and approved the final draft. Li Zhang performed the experiments, analyzed the data, prepared figures and/or tables, and approved the final draft. Dong Liang performed the experiments, analyzed the data, prepared figures and/or tables, and approved the final draft. Yujie Yang analyzed the data, prepared figures and/or tables, and approved the final draft. Jinling Huang conceived and designed the experiments, authored or reviewed drafts of the article, and approved the final draft.

Data Availability
The following information was supplied regarding data availability: The raw data is available in the Supplemental File.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.13635#supplemental-information.